
base pair of guanine (C/G)) as shown in Figure 1.1. Adenine and thymine form two hydrogen bonds (a weaker bond), while cytosine and guanine form three hydrogen bonds (a stronger bond). These base pairings are specific, so the sequence of one strand can be predicted from the other. The length of a DNA sequence is given in base pairs (bp), kilobase pairs (kbp), or megabase pairs (Mbp). RNA exists as a single strand; however, it sometimes forms a double-stranded secondary structure with itself to perform a specific function.
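The pairing rules above can be illustrated with a minimal Python sketch. The function names (`reverse_complement`, `hydrogen_bond_count`, `format_length`) are hypothetical helpers introduced here for illustration, and the sketch assumes an unambiguous DNA sequence containing only A, C, G, and T.

```python
# Watson-Crick pairing: A pairs with T (2 hydrogen bonds), C pairs with G (3).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Predict the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bond_count(seq: str) -> int:
    """Total hydrogen bonds holding this strand to its complement."""
    return sum(HYDROGEN_BONDS[base] for base in seq.upper())


def format_length(n_bp: int) -> str:
    """Express a sequence length in bp, kbp, or Mbp."""
    if n_bp >= 1_000_000:
        return f"{n_bp / 1_000_000:g} Mbp"
    if n_bp >= 1_000:
        return f"{n_bp / 1_000:g} kbp"
    return f"{n_bp} bp"


print(reverse_complement("ATGC"))   # GCAT
print(hydrogen_bond_count("ATGC"))  # 10
print(format_length(1500))          # 1.5 kbp
```

A sequence rich in C/G pairs yields a higher hydrogen bond count than an A/T-rich sequence of the same length, which reflects the stronger C/G pairing described above.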

The genome of an organism is the book of life for that organism. It determines the living aspects and biological activities of cells. A genome contains coding regions known as genes that carry information for protein synthesis. Genes are transcribed into messenger RNA (mRNA), which is translated into proteins, and the proteins control most of the biological processes in living organisms.
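The flow from gene to mRNA to protein can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: `CODON_TABLE` holds only a handful of codons from the standard genetic code, the input is assumed to be the coding (sense) strand, and `transcribe` and `translate` are hypothetical helper names.

```python
# A small subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K",
    "UGG": "W", "GCU": "A",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """Write the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        # Raises KeyError for codons outside the illustrative subset.
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGTTTGGCTAA")
print(mrna)             # AUGUUUGGCUAA
print(translate(mrna))  # MFG
```

In practice a complete codon table (64 entries) would be used, for example from an established library, rather than the truncated dictionary shown here.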

A gene consists of coding regions, non-coding regions, and a regulatory region. The coding regions in eukaryotic genes are not continuous; non-coding sequences (called introns) are found between the coding sequences (called exons). These introns are removed from the transcripts before translation, leaving only the exons, which form the coding region called the open reading frame (ORF). Each eukaryotic gene has its own regulatory region that controls its expression. In prokaryotic cells, a group of genes, called an operon, is regulated by a single regulatory region. Viruses, which fall on the margin between living organisms and chemical particles, function and replicate only inside host cells, using the host cell's machinery, such as ribosomes, to synthesize viral structural and non-structural proteins and to produce new virions.
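Intron removal and the resulting open reading frame can be sketched as follows. The transcript and the exon coordinates are made-up values for illustration only, `splice` and `find_orf` are hypothetical helper names, and the sketch assumes the exon positions are already known (real splicing depends on splice-site signals that are not modeled here).

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons, given as (start, end) half-open intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def find_orf(mrna: str) -> str:
    """Return the first AUG through the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return ""  # no in-frame stop codon found


# Toy transcript: exon 1, an intron (starts with GU, ends with AG), exon 2.
pre_mrna = "AUGGCU" + "GUAAGUCCAG" + "UGGUAA"
exons = [(0, 6), (16, 22)]

mature = splice(pre_mrna, exons)
print(mature)            # AUGGCUUGGUAA
print(find_orf(mature))  # AUGGCUUGGUAA
```

Here the ORF spans the whole spliced transcript; in a real mRNA it would usually be flanked by untranslated sequence on both sides.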

FIGURE 1.1  Base pairing and hydrogen bonds between pairs of the DNA nucleotides (guanine with cytosine, adenine with thymine).